C:/Documents and Settings/Michel/Mes documents/projects/clustering/archive/fuzzieee/grira05semi-supervised.dvi

Authors

  • Nizar Grira
  • Michel Crucianu
  • Nozha Boujemaa
Abstract

Traditional clustering algorithms usually rely on a pre-defined similarity measure between unlabelled data items to identify natural classes of items. Compared with what a human expert would provide on the same data, the results may be disappointing if the similarity measure employed by the system differs too much from the one a human would use. To obtain clusters that better fit user expectations, we can exploit, in addition to the unlabelled data, some limited form of supervision, such as constraints specifying whether or not two data items belong to the same cluster. The resulting approach is called semi-supervised clustering. In this paper, we put forward a new semi-supervised clustering algorithm, Pairwise-Constrained Competitive Agglomeration: clustering is performed by minimizing a competitive agglomeration cost function with a fuzzy term corresponding to the violation of constraints. We present comparisons performed on a simple benchmark and on an image database.
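The abstract does not state the cost function itself. As a rough illustration only, the sketch below evaluates one plausible form of such an objective: a fuzzy within-cluster dispersion term, plus a penalty weighted by `alpha` for the fuzzy degree to which must-link and cannot-link constraints are violated, minus an agglomeration term weighted by `beta` that rewards large clusters. The function name `pcca_style_cost`, its parameters, and the exact form of each term are assumptions made for this example, not the authors' published formulation.

```python
def pcca_style_cost(points, centers, memberships, must_link, cannot_link, alpha, beta):
    """Evaluate an assumed competitive-agglomeration-style cost with a fuzzy
    constraint-violation penalty.

    points      : list of coordinate tuples, one per data item
    centers     : list of coordinate tuples, one per cluster
    memberships : memberships[i][k] is the fuzzy membership of item i in
                  cluster k (each row is expected to sum to 1)
    must_link   : list of (i, j) index pairs that should share a cluster
    cannot_link : list of (i, j) index pairs that should be separated
    alpha, beta : weights of the penalty and agglomeration terms (assumed)
    """
    n_items = len(points)
    n_clusters = len(centers)

    def sq_dist(a, b):
        # Squared Euclidean distance between two coordinate tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Fuzzy within-cluster dispersion: sum over i, k of u_ik^2 * d^2(x_i, mu_k).
    dispersion = sum(
        memberships[i][k] ** 2 * sq_dist(points[i], centers[k])
        for i in range(n_items)
        for k in range(n_clusters)
    )

    # Must-link violation: membership mass placing i and j in different clusters.
    ml_violation = sum(
        memberships[i][k] * memberships[j][l]
        for i, j in must_link
        for k in range(n_clusters)
        for l in range(n_clusters)
        if k != l
    )

    # Cannot-link violation: membership mass placing i and j in the same cluster.
    cl_violation = sum(
        memberships[i][k] * memberships[j][k]
        for i, j in cannot_link
        for k in range(n_clusters)
    )

    # Agglomeration term: sum of squared fuzzy cluster cardinalities.
    agglomeration = sum(
        sum(memberships[i][k] for i in range(n_items)) ** 2
        for k in range(n_clusters)
    )

    return dispersion + alpha * (ml_violation + cl_violation) - beta * agglomeration
```

Under these assumptions, a partition that respects the supplied constraints yields a lower cost than the same partition evaluated against constraints it violates, which is the behaviour the abstract describes. A full algorithm would also need update rules for the memberships and centers, which this sketch does not attempt.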

Similar resources

Apport automatisé de sémantique lors de manipulations de documents géographiques

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau...

Composition of fish communities in macrotidal salt marshes of the Mont Saint-Michel bay (France)

Metaheuristic clustering of Persian XML documents based on structural and content similarity

Given the increasing number of XML documents, organizing them effectively in order to retrieve useful information is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...

Recherche d'information dans les documents numériques : vers une variation des modalités d'exécution procédurale

Document clustering based on ontology and a fuzzy approach

Data mining, also known as knowledge discovery in databases, is the process of discovering previously unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into groups. The most important step...

Publication date: 2005